ABSTRACT

In this paper we investigate a new computing paradigm, called 
SocialCloud, in which computing nodes are governed by social
ties derived from a bootstrapping trust-possessing social
graph. We investigate how this paradigm differs from exist- 
ing computing paradigms, such as grid computing and the 
conventional cloud computing paradigms. We show that in- 
centives to adopt this paradigm are intuitive and natural, and 
security and trust guarantees provided by it are solid. We 
propose metrics for measuring the utility and advantage of 
this computing paradigm and, using real-world social graphs
and structures of social traces, we investigate the potential
of this paradigm for ordinary users. We study several design 
options and trade-offs, such as scheduling algorithms, cen- 
tralization, and straggler handling, and show how they affect 
the utility of the paradigm. Interestingly, we conclude that
graphs known in the literature for high trust properties, which
do not serve distributed trusted computing algorithms such as
Sybil defenses well because of their weak algorithmic properties,
are good candidates for our paradigm because of their
self-load-balancing features.
1. INTRODUCTION

Cloud computing is a new paradigm of computing
that overcomes the restrictions of conventional computing
paradigms by enabling new technological and economical
aspects, such as elasticity and pay-as-you-go pricing,
which free users from long-term commitments and obligations
towards service providers. Cloud computing is
beneficial for both consumers and cloud service providers.
While it meets customers' and users' technological demands,
the cloud computing paradigm is also a rich
field of profit for cloud providers.

For users, cloud computing overcomes several shortcomings
of conventional computing paradigms, where the
infrastructure and software used are owned by the user.
For example, cloud computing enables users of the cloud,
who can also be providers of services, to virtually locate
their contents closer to their consumers and reduce the
latency of serving such contents, a challenging issue in
conventional computing settings. Also, considering the
return on investment, cloud computing has appealing
economical benefits and incentives, which make it a
desirable option for many users. These incentives can be
seen in the long run as a reduced overall cost compared
with the hardware and software liabilities and maintenance
costs of alternative paradigms [3]. As for providers,
benefits are also economical in the absolute sense.

The current conventional cloud computing paradigm
has many benefits, despite posing several challenging
issues that need to be addressed before wider adoption
by many potential users [22]. Examples of these issues
include the need for a concrete and clear business model
that outlines clearer service level agreements (SLAs) and
guarantees the rights of users, the need for
architectures that consider the variety of potential
applications demanded by users, the need for programming
models that consider the large scale of data in
the cloud, and the need for new applications that benefit
from the architectural and programming models in
the cloud, among other issues. While many of these issues
are being constantly addressed in ongoing research
efforts, where several architectures [51], programming
models, and applications [23, 55] have been proposed,
security and data privacy are chief among the issues to be
considered before this paradigm is widely accepted.
Indeed, both outsider and insider threats to security and
privacy of data in cloud systems are unlimited. Also,
incentives do exist for cloud providers to make use of
users' data residing in the cloud for their own benefit,
given the lack of regulations and enforcing policies.

In this paper, we envision a new type of computing
paradigm, called SocialCloud, that enjoys some of
the merits provided by the conventional cloud. Imagine
a computing paradigm in which users
who collectively construct a pool of resources perform
computational tasks on behalf of their social acquaintances.
Our paradigm and model are similar in many
aspects to the conventional grid-computing paradigm:
users can outsource their computational tasks to peers,
complementing the use of friends for storage, which is
extensively studied in the literature. Our paradigm is,
however, unique in many aspects as well. Most importantly,
it exploits the trust exhibited in social networks as a
guarantee for the good behavior of other "workers" in the
system. Accordingly, the most important ingredient
of our paradigm is the social bootstrapping graph, a
graph that is used for recruiting workers for a social
network.

Indeed, social networks are very popular (c.f. §3.1).
This popularity has opened the door
wide for investigating the potential of these networks
for many applications. Problems that are otherwise
unsolvable in cyberspace are easily solvable using social
networks, because they possess both algorithmic properties,
such as connectivity, and trust, which are used to reason
about the behavior of honest users in the social network
and to limit the misbehavior introduced by malicious
users, supported by efficiency features. Most important
in the context of our paradigm is the aggregate
computational power of nodes in the social network.
Indeed, beyond the nodes and social links, social networks
consist of users with computing machines that
are idle most of the time [6]. Furthermore, owners
of these computing machines are willing to share their
computing resources with their friends, under a different
economical model than that of the conventional cloud
computing paradigm: a fully altruistic one. This behavior
makes our work share commonalities with an existing
stream of work on creating computing services through
volunteers [53]. Our results hence highlight technical
aspects of this direction and pose challenges for
design options when using social networks for recruiting
such workers and enabling trust.

1.1 Contributions 

To this end, our contribution in this paper is mainly 
twofold: 

• First, we investigate the potential of the social 
cloud computing paradigm by introducing a design 
that bootstraps from social graphs to construct
distributed computing services. We advocate the
merits of this paradigm over existing ones such as 
the grid computing paradigm. 

• Second, we verify the potential of our paradigm using
a simulation setup and real-world social graphs
with varying social characteristics that reflect different,
and possibly contradicting, trust models.
Both the graphs and the simulator are made public [40]
so that the community can use them and improve them
with additional features.

1.2 Organization 

The organization of this paper is as follows. In §2
we argue for the case of our paradigm. In §3 we review
the preliminaries of this work. In §4 we introduce the
main design, including an intensive discussion of the
design options. In §5 we describe our simulator used
for verifying the performance aspects of our design. In
§6 we introduce the main results and detailed analyses
and discussion of the design options, their benefits,
and limitations. In §7 we summarize some of the related
work, including work on using social networks for
building trustworthy computing services. In §8 we
draw concluding remarks, followed by future work and
directions in §9.

2. THE CASE FOR SocialCloud 

In this paper, we look at the potential of using
unstructured social graphs for building distributed
computing systems. These systems are proposed with several
anticipated benefits in mind. First, such systems
would exploit locality of data based on the applications
they are intended for, under the assumption that the
data would be stored at multiple locations and shared
among users represented in the social network (see §3.4
and [53] for concrete examples of such applications). This
is in fact not a far-fetched assumption. For example,
consider a co-authorship social graph, like the one used
in our experiments, where SocialCloud is proposed
for deployment. In that scenario, the data on which
computations are to be performed is likely to be at multiple
locations: on machines of research collaborators,
co-authors, or previous co-authors. Even for some online
social networks, the assumption and the achieved benefits
are not far-fetched, considering that friends
would have similar interests and are likely to have contents
replicated across different machines, which could be
of interest to use in our computing paradigm.
Examples of such settings include photos taken at parties
and videos, for image processing applications, among
others.

The second advantage of this paradigm is its
trustworthiness. In the recent literature, there has been a lot
of interest in the distributed computing community in
exploiting social networks to perform trustworthy
computations. Examples include
exploiting social networks for cryptographic signing
services [55], Sybil defenses [58], and routing in
many settings, including delay tolerant networks [7, 11].
In all of these cases, along with the algorithmic
properties of these social networks, the built designs
exploit the trust in social networks. The trust in these
networks rationalizes the assumption of collaboration
in these systems, and the tendency of nodes in the
network to act according to the intended protocol with
the theorized guarantees. As in all of these applications,
SocialCloud tries to exploit the trust aspect
of the social network, and thus it is easy to reason about
the behavior of nodes in this paradigm (c.f. §3.3).

Related to trust exhibited in the social fabric uti- 
lized in our paradigm, the third advantage is that it 
is also easy to reason about the recruitment of workers.
In this context, workers are nodes that are willing
to perform computing tasks for other nodes (task
outsourcers). This feature, when associated with the
aforementioned trust, is quite advantageous when com- 
pared to the challenge of performing trustworthy com- 
puting on dedicated workers in the conventional grid- 
computing paradigm, where it is hard to recruit such 
workers. 

Finally, our design envisions an altruistic model of
SocialCloud, where nodes participate in the system and
do not expect anything in return. Further details on this
model are in §3.3.

Grid Computing. While SocialCloud uses a paradigm
similar to that of grid computing, in the sense that both
try to outsource computations and use high aggregate
computational resources, SocialCloud is slightly
different. In particular, in SocialCloud there is a
pre-defined relationship between the task outsourcer and
the computing worker, which does not exist in the
grid-computing paradigm. We limit the computations to
1-hop neighbors, which further improves the
trustworthiness of computations in our model.

3. ASSUMPTIONS AND SETTINGS 

In this section, we review the preliminaries required 
for understanding the rest of this paper. In particu- 
lar, we elaborate on the social networks, their popular- 
ity, and their potential for being used as bootstrapping 
tools for systems, services, and protocols. We describe 
the social network formulation at a high level, the eco- 
nomical aspect of our system, and finally, the attacker 
model. 

3.1 Systems on Social Networks 

Social networks are very popular. Nine of the twenty
most popular sites on the web are social networking
sites [24]. The top ten online social networking websites
have more than 650 million unique visitors per
month in total. The most popular social network,
Facebook [25], alone serves 250 million unique visitors per
month, or more than 96 unique visitors per second.
Such popularity has motivated many designs, protocols,
and applications on top of social networks. Examples
include routing [7, 11, 20, 37], social gossip [26], and
Sybil defenses [58].
While they differ in the details of their operation,
all of these designs and protocols weigh algorithmic
properties (connectivity), trust, and collaboration
in the underlying social networks, which are used for
bootstrapping such systems.

3.2 Social Graphs — High Level Description 

In this paper we view the social network as an undirected
and unweighted graph $G = (V, E)$, where
$V = \{v_1, \ldots, v_n\}$ is the set of vertices, representing
the set of nodes in the social graph and corresponding to
users (or computing machines), and $E = \{e_{ij}\}$ (where
$1 \le i \le n$ and $1 \le j \le n$) is the set of edges
connecting those vertices, which implies that nodes
associated with the social ties are willing to perform
computations for each other. $|V| = n$ denotes the size of
$G$ and $|E| = m$ denotes the number of edges in $G$. In the
rest of the paper, social network, network, and graph are
used interchangeably to refer to both the physical computing
network and the underlying bootstrapping social graph,
and the meaning depends on the context. Also, we refer
to computing entities associated with users in the social
network as nodes.
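
The formulation above can be illustrated with a minimal sketch, assuming the social graph is given as a list of undirected edges. The function names and the toy edge list are hypothetical and are not taken from the simulator; the 1-hop neighbors of a node are simply its entry in the adjacency map.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected, unweighted graph as a dict of neighbor sets."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops: a node does not outsource to itself
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def graph_size(adj):
    """Return (n, m): the number of vertices and of undirected edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    return n, m


# Hypothetical toy graph: a triangle (1, 2, 3) plus a pendant node 4.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
toy_graph = build_graph(toy_edges)
```

Here `toy_graph[3]` is the set of candidate workers for node 3 under the 1-hop restriction.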

3.3 Economics of SocialCloud 

In our design we assume an altruistic model, which 
simplifies the behavior of users and arguments on the 
attacker model. In this altruistic model, users in the 
social network donate their computing resources — while 
not using them — to other users in the social network to 
use them for specific computational tasks. In return,
the same users who donated their resources to others
would expect others to perform computations on their
behalf when needed.

One can further improve this model. Social networks
are rich in trust characteristics that capture additional
features, and can be used to rationalize this model in
several ways. For example, trust in social networks, a
well studied vein of research in this context [38], can
be used to adjust this model so that users bind
their participation in computations to the trust values
that they assign to other users. In this work, in order to
make use of and confirm this model, we limit outsourced
computations to 1-hop neighbors.

While we do not consider it in this paper, another
model using interests and groups is worth mentioning
for its popularity and potential as future work. The
incentives model can be further relaxed by enabling an
"interest"-based model of computation, where workers
perform computations only for other nodes in the graph
that share some interest with them. This interest can be
publicly identified by the membership of a node in a group.
Investigating this model is left as future work.

3.4 Use Model and Applications 

For our paradigm, we envision compute-intensive
applications, for which other systems have been developed
in the past using different design principles but lacking
trust features, although trust is needed in such applications
and is provided by our paradigm. These systems include
ones with resources provided by volunteers, as well as
grid-like systems, such as Condor [35], MOON [31],
Nebula, and SETI@Home [2].

Specific examples of applications built on top of these
systems, which would also fit our use model, include
blog analysis [53], web crawling and social-network
applications (collaborative filtering, image processing, etc.) [11],
and scientific computing [52], among others.

Notice that each of these applications requires certain
levels of trust, for which social ties are best suited
as a trust bootstrapping and enabling tool. In particular,
reasoning about the behavior of systems and their expected
outcomes (in a computing system in particular) would
be well served by this trust model. We notice that this
social trust has previously been used as an enabler for
privacy in file-sharing systems [30], anonymity in
communication systems [42], and collaboration in Sybil
defenses [53, 57, 38], among others. In this work, we use
the same insight to propose a computing paradigm that
relies on such trust and on volunteered resources, in the
form of shared computing time. With that in mind, in
the following section we elaborate on the attacker model
used in our system and the trust models provided by our
design, thus highlighting its advantages and distinguishing
our work from prior works in the literature.

3.5 Attacker Model 

In this paper, as is the case in many other systems
built on top of social networks [57, 58, 49], we assume
that the attacker is restricted in many aspects. For
example, the attacker has only a limited capability of
creating edges between himself and other nodes
in the social graph, and cannot create arbitrarily many
of them.

While this restriction may contradict some recent
results in the literature [8], where it is shown that some
legitimate users befriend random users in the social
network who are potentially attackers, it can be relaxed to
achieve the intended trust and attack model by considering
an overlay over a subset of the friends of each user. This
overlay expresses the trust value of the social graph well
and eliminates the influence introduced by an attacker
who has infiltrated the social graph [35]. For example, since
each user decides to which of his adjacent nodes to
outsource computations, each user is aware
of which users he knows well and which are just
social encounters that could be potential attackers.
Accordingly, the user himself decides whether to include
a given node in his overlay or not, thus minimizing or
eliminating harm and achieving the required trust and
attack model.

The description of the above attacker model might
be at odds with the rest of the paper, especially given that
we use some online social networks that do not reflect
the characteristics of trust required in our paradigm.
However, such networks, when used, are used for two
reasons. First, to derive insight into the potential of such
social networks, and others that share similar topologi- 
cal characteristics, for performing computational tasks 
according to the method devised in this paper. Second, 
we use them to illustrate that some of these social net- 
works might be less effective than the trust-possessing 
social graphs, which we strongly advocate for our com- 
puting paradigm. 

3.6 Trust in Grid Computing Systems 

While there has been a lot of research on characterizing
and improving trust in the conventional grid
computing paradigm [31], which is the closest
paradigm to compare to ours, trust guarantees in that
paradigm are less strict than what is expressed by social
trust. For that reason, it is easy to see that some nodes in
the grid computing paradigm may act maliciously by,
for example, returning wrong computation results, or may
refuse to collaborate; the latter is easier to detect and
tolerate than malicious behavior [13].

4. THE DESIGN OF SOCIALCLOUD 

The main design of SocialCloud is very simple;
its complexities are hidden in design choices and
options. In SocialCloud, the computing overlay is
bootstrapped by the underlying social structure.
Accordingly, nodes in the social graph act as workers for
their adjacent nodes (i.e., nodes that are one hop away
from the outsourcer of computations). An illustration
of this design is depicted in Figure 1. In this design,
nodes in the social graph, and those in the SocialCloud
overlay, outsource computational tasks to their neighbors.
For that purpose, they utilize local information to decide
how much computation they want each of their neighbors
to take care of. Accordingly, each node has a scheduler,
which she uses to decide the proportion of a task to
outsource to any given worker among her neighbors. Once
a task is outsourced to a given worker, and assuming that
both the data and the code for processing the task are
transferred to the worker, the worker is left to decide how
to schedule the task locally in order to compute it. Upon
completion of a task, the worker sends the result of the
computation back to the outsourcer.
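
As a rough sketch of the outsourcer-side decision described above, the following splits a task among 1-hop neighbors in proportion to per-neighbor weights. The weights are an assumption made for illustration (they could stand for availability or trust values); the paper does not prescribe how the proportions are derived, and equal weights reduce to an equal split.

```python
def split_task(task_size, weights):
    """Split a task among workers in proportion to their weights.

    task_size: total work, in virtual time units.
    weights:   {worker id: non-negative weight}; a hypothetical stand-in
               for whatever local information the outsourcer uses.
    Returns {worker id: share of the task}.
    """
    total = sum(weights.values())
    if total <= 0:
        return {}  # no usable worker: nothing can be outsourced
    return {w: task_size * wt / total for w, wt in weights.items()}


equal_shares = split_task(900.0, {"a": 1, "b": 1, "c": 1})
weighted_shares = split_task(900.0, {"a": 2, "b": 1})
```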
4.1 Design Options: Scheduling Entity

In SocialCloud, two schedulers are used. The
first scheduler determines the proportion of a
task outsourced to each worker, and the second scheduler
is used at each worker to determine how, and in which
order, tasks outsourced by outsourcers are computed.
While the latter scheduler can be easily implemented
locally without impacting the system complexity,
the decision of whether to centralize or decentralize
the former scheduler impacts the complexity
and operation of the entire system. In the following, we
elaborate on both design options and their characteristics,
and compare them.

4.1.1 Decentralized scheduler 

In our paradigm, we limit the selection of workers to
nodes 1 hop from the outsourcer. This makes it possible,
and perhaps plausible, to incorporate the scheduling of
outsourced tasks at the side of the outsourcer in a
decentralized manner, so that each node takes care of
scheduling its own tasks. On the one hand, this could reduce
the complexity of the design by eliminating the scheduling
server of a centralized alternative. On the
other hand, this could increase the complexity of the
protocols used, and the cost associated with them, for
exchanging states such as availability of resources and
online and offline time, among others. All such states
are exchanged between workers and outsourcers in our
paradigm. These states are essential for building basic
primitives in any distributed computing system to
improve efficiency (see below for further details). An
illustration of this design option is shown in Figure 1.
In this scenario, each outsourcer, as well as each worker,
has its own separate scheduling component.




Figure 1: A depiction of the main SocialCloud
paradigm as viewed by an outsourcer of
computations. The different nodes in the social 
network act as workers for their friends, who act 
as potential jobs/tasks outsourcers. The links 
between social nodes are ideally governed by 
a strong trust relationship, which is the main 
source of trust for the constructed computing 
overlay. Both job outsourcers and workers have 
their own, and potentially different, schedulers. 



4.1.2 Centralized scheduler

Although nodes may only require their neighbors
to perform computational tasks on their behalf,
which may require only local information
that could be available to these nodes in advance, a
centralized scheduler might be needed to
reduce communication overhead at the protocol level.

For example, in order to decide upon the best set of 
nodes to which to outsource computations, a node needs 
to know which of its neighbors are available, among 
other statistics. For that purpose, and given that the 
underlying communication network topology may not 
necessarily have the same proximity as the social network
topology, the protocol among nodes needs to incur
back-and-forth communication costs.

One possible solution to the problem is to use a cen- 
tralized server that maintains states of the different 
nodes. Instead of communicating directly with neighbor 
nodes, an outsourcer would request the best set of
candidates among its neighbors from the centralized
scheduling server. In response, the server will produce a set
of candidates, based on the locally stored states. Such 
candidates would typically be those that would have 
the most available resources to handle the outsourced 
computation task. 

An illustration of this design option is shown in
Figure 2. In this design, each node in SocialCloud
periodically sends its states to a centralized server. When
needed, an outsourcer node asks the centralized server
for the best set of candidates for outsourcing
computations, which the server returns based on
the states of these candidates. Notice that only states
are returned to the outsourcer, upon which the
outsourcer sends tasks to these nodes on its own.
Thus, the server involvement is limited to the control
protocol.
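
The control protocol above can be sketched as follows. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, and a node's "state" is reduced to a single number for its available resources, which is simpler than the states the paper mentions (availability, online and offline time, among others).

```python
class SchedulingServer:
    """Toy centralized scheduler: stores node states and ranks candidates."""

    def __init__(self):
        # node id -> last reported amount of available resources (assumed scalar)
        self.available = {}

    def report_state(self, node, available_resources):
        """Called periodically by each node to refresh its stored state."""
        self.available[node] = available_resources

    def best_candidates(self, neighbors, k):
        """Return up to k of the given neighbors with the most available resources.

        Neighbors that never reported a state are skipped. Only the ranking
        is returned; the outsourcer sends the tasks to the workers itself.
        """
        known = [n for n in neighbors if n in self.available]
        known.sort(key=lambda n: self.available[n], reverse=True)
        return known[:k]


server = SchedulingServer()
server.report_state("b", 0.2)
server.report_state("c", 0.9)
server.report_state("d", 0.5)
candidates = server.best_candidates(["b", "c", "d", "e"], k=2)
```

In this toy run node "e" is ignored because it has not reported any state.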




Figure 2: The centralized model of task
scheduling in SocialCloud.

The communication overhead of this design option to
transfer states between a set of $d$ nodes is $2d$ messages,
where $d$ messages are required to deliver all nodes' states
to the server and $d$ messages are required to deliver the
states of all other nodes to each node in the set. On the
other hand, $d(d-1)$ messages are required in the
decentralized option (which requires pairwise communication
of state updates). When outsourcing of computations is
possible among all nodes in the graph, this translates into
$O(n)$ communication overhead for the centralized option
versus $O(n^2)$ for the decentralized option. To sum up,
Table 1 shows a comparison between both options.
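
The two message counts can be compared directly with a small sketch. The function names are illustrative only; the formulas are the $2d$ and $d(d-1)$ counts stated above.

```python
def centralized_messages(d):
    """d state reports to the server plus d replies from the server."""
    return 2 * d


def decentralized_messages(d):
    """Every node sends its state to each of the other d - 1 nodes."""
    return d * (d - 1)


# Message counts for a few illustrative set sizes.
comparison = {d: (centralized_messages(d), decentralized_messages(d))
              for d in (3, 10, 100)}
```

For $d = 3$ both options cost 6 messages; beyond that the decentralized option grows quadratically, for example 20 versus 90 messages at $d = 10$.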

Table 1: A comparison between the centralized 
and decentralized scheduler options. Compared 
features are resistance to failure, communication 
overhead, required additional hardware, and re- 
quired additional trust. F stands for failure, C 
stands for communication, H stands for hard- 
ware, and T stands for trust. 



Option 


F 


c 


H 


T 


Centralized 
Decentralized 


✓ 


0{n) 
0(n2) 


n 

✓ 


✓ 



4.2 Tasks Scheduling Policy 

While the use of distributed or centralized scheduling 
entity resolves the issue of scheduling at the outsourcer 
side, two decisions remain unsolved: how much com- 
putation to outsource to each node (worker), and how 
much time a node among these workers should spend on 
a given task for a certain outsourcer. We handle these 
two issues separately. 

As mentioned earlier, any off-the-shelf scheduling
algorithm can be utilized to decide the right scheduling
policy at the side of the outsourcer, which can be further
improved by incorporating trust characterization models
for weighted job scheduling [55]. For worker scheduling,
on the other hand, we consider several scheduling
options as follows. Notice that all of these policies are
applied with respect to "computing time", which
requires estimating the time required for each task as a
first step.

• Round Robin (RR) Scheduling Policy. This 
is the simplest policy to implement, in which a 
worker spends an equal share of time on each out- 
sourced task in a round robin fashion among all 
tasks he has. 

• Shortest First (SF) Scheduling Policy. The
worker performs the shortest task first.

• Longest First (LF) Scheduling Policy. The
worker performs the longest task first.

Notice that we omit a lot of details about the underlying
computing infrastructure, and abstract such infrastructure
to "time sharing machines", which further simplifies
much of the analysis in this work. In the results,
we experiment with the three scheduling policies.
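
The three worker-side policies can be sketched on a single time-sharing worker as below. This is an illustrative model rather than the simulator's code: the function name, the fixed time `quantum` used to approximate round robin, and the sample task sizes are all assumptions.

```python
def finish_times(task_sizes, policy, quantum=1.0):
    """Return {task index: finishing time} on one time-sharing worker.

    task_sizes: estimated computing time of each outsourced task.
    policy:     "RR" (round robin), "SF" (shortest first), "LF" (longest first).
    quantum:    time slice per turn for RR (an assumed parameter).
    """
    if policy in ("SF", "LF"):
        # Run tasks to completion, ordered by size.
        order = sorted(range(len(task_sizes)),
                       key=lambda i: task_sizes[i],
                       reverse=(policy == "LF"))
        clock, done = 0.0, {}
        for i in order:
            clock += task_sizes[i]
            done[i] = clock
        return done
    if policy == "RR":
        if quantum <= 0:
            raise ValueError("quantum must be positive")
        remaining = dict(enumerate(task_sizes))
        clock, done = 0.0, {}
        while remaining:
            # Give each unfinished task one time slice per round.
            for i in list(remaining):
                run = min(quantum, remaining[i])
                clock += run
                remaining[i] -= run
                if remaining[i] <= 0:
                    done[i] = clock
                    del remaining[i]
        return done
    raise ValueError(f"unknown policy: {policy}")


sizes = [3.0, 1.0, 2.0]  # hypothetical task sizes in virtual time units
```

With these sizes the last task finishes at time 6.0 under all three policies, but the individual finishing times differ, which is what matters to each outsourcer waiting on its own sub-task.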

4.3 Handling Outliers 

The main performance criterion used for evaluating 
SocialCloud is the time required to finish computing 
tasks for all nodes with tasks in the system. Accord- 
ingly, an outlier (also called a computing straggler) is 
a node with computational tasks that take a long time 
to finish, thus increasing the overall time to finish and 
decreasing the performance of the overall system. De- 
tecting outliers in our system is simple: since the total 
time is given in advance, outliers are nodes whose
computing tasks are still unfinished while other
nodes participating in the same outsourced computation
are idle.

Our method for handling outliers is simple too: when
an outlier is detected, we outsource the remaining part
of its computations to all idle nodes neighboring the
original outsourcer. For that, we use the same scheduling
policy used by the outsourcer when she first outsourced
this task. In the simulation part, we consider both
scenarios, of handled and unhandled outliers, and observe
how they affect the performance of the system.
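
A minimal sketch of the detection and handling steps follows. The function names are hypothetical, and the even split of the remaining work is an assumption standing in for "the same scheduling policy used by the outsourcer".

```python
def find_stragglers(remaining):
    """Workers with unfinished work while at least one co-worker is idle.

    remaining: {worker id: remaining work} for one outsourced computation.
    """
    idle = {w for w, r in remaining.items() if r <= 0}
    busy = {w for w, r in remaining.items() if r > 0}
    return busy if idle else set()


def reassign_straggler(remaining_work, neighbors, idle):
    """Split a straggler's remaining work evenly among idle neighbors.

    neighbors: 1-hop neighbors of the original outsourcer.
    idle:      set of nodes that are currently idle.
    """
    helpers = [n for n in neighbors if n in idle]
    if not helpers:
        return {}  # nobody is free: the straggler keeps its work
    share = remaining_work / len(helpers)
    return {n: share for n in helpers}


stragglers = find_stragglers({"a": 0.0, "b": 40.0, "c": 0.0})
new_shares = reassign_straggler(120.0, ["a", "b", "c", "d"], idle={"b", "d"})
```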

4.4 Deciding Workers Based on Resources 

In a real-world deployment of a system like SocialCloud,
we expect heterogeneity of resources, such as
bandwidth, storage, and computing power, in workers. 
This heterogeneity would result in different results and 
utilization statistics of a system like SocialCloud, de- 
pending on which nodes are used for what tasks. 

While our work does not address this issue and leaves
it as future work, we believe that simple decisions can be
made in this regard
so as to meet the design goals and achieve good
performance. For example, we expect that nodes would
select workers among their social neighbors that have
resources and link capacities exceeding a threshold, thus
meeting an expected performance.
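
One simple form such a decision could take is a threshold filter over neighbors, sketched below. The resource fields (`cpu`, `bandwidth`), their units, and the thresholds are hypothetical; the paper leaves resource-aware selection as future work.

```python
def eligible_workers(resources, min_cpu, min_bandwidth):
    """Keep neighbors whose resources exceed both thresholds.

    resources: {neighbor id: {"cpu": ..., "bandwidth": ...}}, where both
               field names and their units are assumed for illustration.
    """
    return [n for n, r in resources.items()
            if r["cpu"] > min_cpu and r["bandwidth"] > min_bandwidth]


neighbor_resources = {
    "a": {"cpu": 2.0, "bandwidth": 10.0},
    "b": {"cpu": 0.5, "bandwidth": 50.0},
    "c": {"cpu": 3.0, "bandwidth": 1.0},
    "d": {"cpu": 4.0, "bandwidth": 20.0},
}
selected = eligible_workers(neighbor_resources, min_cpu=1.0, min_bandwidth=5.0)
```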

5. SIMULATOR OF SocialCloud 

To demonstrate the potential of SocialCloud as a
computing paradigm, we implement a batch-based
simulator [40] that considers a variety of scheduling
algorithms, an outlier handling mechanism, job generation
handling, and failure simulation. A flow diagram of the
simulator is in Figure 3.

The flow of the simulator, which represents the flow
of the system, is depicted in Figure 3. First, the node
factory uses the bootstrapping social graph to create
nodes and their workers. Each node then decides
whether she has a task or not, and if she has a task she
schedules it according to her scheduling algorithm.
If needed, each node then transfers the code to be run
to the worker, along with
the splits of the data for this code to run on. Each
worker then performs the computation according to the
scheduling algorithm of the worker and returns the
results of the computations to the outsourcer.
Timing. In SocialCloud, we use virtual time to
simulate computations and resource sharing. We scale
down the simulated time by 3 orders of magnitude
relative to reality. That is, for every second worth of
computations in the real world, we use one millisecond in
the simulation environment. Thus, units of time in the rest
of this paper are in virtual seconds.
Figure 3: The flow diagram of SocialCloud: the
social graph is used for bootstrapping the computing
service and recruiting workers, nodes are
responsible for scheduling their tasks by determining
the amount of work each of their neighbors
would process, and each worker (node) uses its
local scheduler to determine how much time is
allowed for each sub-task outsourced by its neighbors.

6. RESULTS AND ANALYSIS 

In this section, in order to derive insight on the po- 
tential of SocialCloud, we experiment with the sim- 
ulator described above. Before getting into the details 
of the experiments, we describe the data and evaluation 
metric used in this section. 

6.1 Evaluation Metric 

To demonstrate the potential of operating SocialCloud,
we use the "normalized finishing time" of a task
outsourced by a user to other nodes in the SocialCloud
as the performance metric. We consider the same metric
over the different graphs used in the simulation. To
demonstrate the performance for the population of all
nodes that have tasks to be computed in the system, we
use the empirical CDF (cumulative distribution
function) as an aggregate measure. For a random
variable $X$, the CDF is defined as
$F_X(x) = \Pr(X \le x)$. In our experiments, the CDF
measures the fraction (or percent) of nodes that finish
their tasks before a point in time x, as a part of the
overall number of tasks. We define x in factors of the
time of normal operation on dedicated machines, if they
were to be used instead of outsourcing computations.
That is, suppose that the overall time of a task is
$T_{tot}$ and the time it takes the slowest worker to
compute its subtask is $T_{last}$; then x for that node
is defined as $T_{last}/T_{tot}$.
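To make the metric concrete, the following Python sketch computes the normalized finishing time of a few outsourcers and evaluates the empirical CDF at two points. The function names and the finishing times are made up for illustration and are not measurements from the simulator.

```python
def normalized_finish_time(subtask_times, total_time):
    """x = T_last / T_tot: finishing time of the slowest worker divided by
    the time the whole task would take on a dedicated machine."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return max(subtask_times) / total_time


def empirical_cdf(values, x):
    """Fraction of observed values that are less than or equal to x."""
    if not values:
        raise ValueError("need at least one value")
    return sum(1 for v in values if v <= x) / len(values)


# Hypothetical per-worker finishing times (virtual time units) for four
# outsourcers, each with a task of 1000 units.
outsourcers = [
    ([250, 250, 250, 250], 1000),  # four idle workers
    ([500, 900], 1000),            # two workers, one busier than the other
    ([1500, 400, 300], 1000),      # one straggler dominates
    ([2200], 1000),                # a single, heavily loaded worker
]

xs = [normalized_finish_time(times, total) for times, total in outsourcers]
print(xs)
print(empirical_cdf(xs, 1.0))  # share of outsourcers done by x = 1
print(empirical_cdf(xs, 2.0))  # share of outsourcers done by x = 2
```

A value of x below 1 means the outsourcer finished sooner than it would have on a dedicated machine; the CDF at x = 1 is therefore the fraction of outsourcers that benefit from outsourcing.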

6.2 Tasks Generation and Weights 

Also, to demonstrate the operation of our simulator and
the trade-off that such operation provides, we consider
two different approaches for the tasks generated by
each user. The size of each generated task is measured
in virtual units of time, and for our demonstration we
use two different scenarios:

• Constant task weight. Each outsourcer generates
tasks of equal size. These tasks are divided into equal
shares and distributed among different workers in the
computing system. The size of each task is $T$.

• Variable task weight. Each outsourcer has a
different task size. We model the size of tasks as a
uniformly distributed random variable in the range
$[\bar{T} - \ell, \bar{T} + \ell]$ for some
$\bar{T} > \ell$. Each worker receives an equal share of
the task from the outsourcer.
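The two scenarios can be sketched as follows. The function names are hypothetical; the mean of 1000 and half-width of 500 match the interval [500, 1500] used later in the experiments, and the seed is arbitrary.

```python
import random


def constant_task_size(t):
    """Constant task weight: every outsourcer gets a task of size t."""
    return float(t)


def variable_task_size(mean, half_width, rng):
    """Variable task weight: size drawn uniformly from
    [mean - half_width, mean + half_width], with mean > half_width > 0."""
    if not 0 < half_width < mean:
        raise ValueError("require 0 < half_width < mean")
    return rng.uniform(mean - half_width, mean + half_width)


rng = random.Random(7)  # fixed seed so the sketch is repeatable
sizes = [variable_task_size(1000, 500, rng) for _ in range(10000)]

print(constant_task_size(1000))
print(min(sizes), max(sizes))
print(sum(sizes) / len(sizes))  # close to the constant size of 1000
```

Both scenarios have the same average task size, so any difference in finishing time between them comes from the spread of task sizes rather than from the total amount of work.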

6.3 Deciding Tasks Outsourcers 

Not all nodes in the system are likely to have tasks to
outsource for computation at the same time.
Accordingly, we denote the fraction of nodes that have
tasks to compute by p, where 0 < p < 1. In our
experiments we use p from 0.1 to 0.5 with increments of
0.1. We further consider that each node in the network
has a task to compute with probability p, and has no
task with probability 1 - p; thus, whether a node has a
task to distribute among its neighbors is a Bernoulli
trial with parameter p, and the number of nodes with
tasks follows a binomial distribution. Once a node is
determined to be among the nodes with tasks in the
current round of the simulator, we fix the task length.
For the task length, we use both scenarios mentioned in
§6.2, with constant and variable task weights.
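A minimal sketch of this selection step, assuming independent draws per node; pick_outsourcers, the node count, and the seed are illustrative choices rather than details of the simulator.

```python
import random


def pick_outsourcers(nodes, p, rng):
    """Mark each node as an outsourcer independently with probability p."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    return [n for n in nodes if rng.random() < p]


rng = random.Random(42)  # arbitrary seed for repeatability
nodes = list(range(10000))

for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    chosen = pick_outsourcers(nodes, p, rng)
    # The realised fraction fluctuates around p from round to round.
    print(p, len(chosen) / len(nodes))
```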

6.4 Social Graphs 

To derive insight into the potential of SocialCloud,
we run our simulator on several social graphs of
different sizes and densities, as shown in Table 2. The
graphs used in these experiments represent three
co-authorship social structures (DBLP, Physics 1, and
Physics 2), one voting network (Wiki-vote, for the
election of Wikipedia administrators), and one
friendship network (from the consumer review website
Epinion). All of these graphs are made undirected, if
they are not already, which rationalizes their use in
our system. Notice the varying density of these graphs,
which also reflects varying topological
characteristics. Also, notice the nature of these
social graphs, which are built in different social
contexts and possess varying qualities of trust [38].

Next, we present the main results and findings of our 
design when operated on these graphs. 

Table 2: Social graphs used in our experiments.

Dataset      # nodes    # edges    Description
DBLP         614981     1155148    CS Co-authorship
Epinion      75877      405739     Friendship network
Physics 2    11204      117649     Co-authorship
Wiki-vote    7066       100736     Voting network
Physics 1    4158       13428      Co-authorship
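As a rough aid for comparing these graphs, the sketch below derives the average degree (2m/n for an undirected graph with n nodes and m edges) and the edge density from the node and edge counts in Table 2. The helper names are illustrative, and the computation assumes the listed edge counts refer to undirected edges.

```python
# (nodes, edges) as listed in Table 2.
graphs = {
    "DBLP": (614981, 1155148),
    "Epinion": (75877, 405739),
    "Physics 2": (11204, 117649),
    "Wiki-vote": (7066, 100736),
    "Physics 1": (4158, 13428),
}


def average_degree(n, m):
    """Average degree of an undirected graph: each edge adds 2 to the
    total degree."""
    return 2 * m / n


def density(n, m):
    """Fraction of the n * (n - 1) / 2 possible undirected edges present."""
    return 2 * m / (n * (n - 1))


avg_degree = {name: average_degree(n, m) for name, (n, m) in graphs.items()}

for name, (n, m) in graphs.items():
    print(f"{name:10s} avg degree {avg_degree[name]:6.2f}  "
          f"density {density(n, m):.2e}")
```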



6.5 Main Results 

In this section we demonstrate our paradigm and discuss
the main results of this work. Due to the lack of
space, we delegate additional results to the technical
report in [39]. For all measurements, our metric of
performance and comparison is the normalized time to
finish, explained in Section 6.1.



6.5.1 Performance When Varying the Number of Outsourcers

In the first experiment, we run our SocialCloud
simulator on the different social graphs discussed
earlier to measure the evaluation metric as the number
of outsourcers of tasks increases. We consider p = 0.1
to 0.5 with increments of 0.1. The results of this
experiment are in Figure 4, on which we make several
observations.

First, we observe the potential of SocialCloud even
when the number of outsourcers of computations in the
social network is as high as 50% of the total number of
nodes, which translates into a small normalized time to
finish even in the worst performing social graphs
(about 60% of all nodes with tasks would finish in 2
normalized time units). However, this advantage varies
for different graphs: we observe that sparse graphs,
like co-authorship graphs, generally outperform other
graphs used in the experiments (compare the tendency in
the performance in Figures 4(a) through 4(c) versus
Figures 4(d) and 4(e)). In the aforementioned graphs,
for example, we see that when 10% of the nodes in each
case are outsourcers, and by fixing x, the normalized
time, to 1, the difference in performance is about 30%.
This difference is observed between the Physics
co-authorship graphs, where 95% of nodes finish their
computations, and the Epinion graph, where only about
65% of nodes finish their computations.

Second, we observe that the impact of p, the fraction
of nodes with tasks in the system, would depend on the
graph rather than p alone. For example, in Figure 4(a)
we observe that moving from p = 0.1 to p = 0.5 (when
x = 1) leads to a decrease in the fraction of nodes that
finish their computations from 95% to about 75%. On
the other hand, for the same settings, this would lead
to a decrease from about 80% to 40%, a decrease from
about 65% to 30%, and a decrease from 70% to 30%
in DBLP, Epinion, and Wiki-vote, respectively. This
suggests that the decreases in the performance are due
to an inherent property of each graph. The inherent
property of each graph and how it affects the
performance of SocialCloud is further illustrated in
Figure 5. Interestingly, we find that even though DBLP
is almost two orders of magnitude larger than
Wiki-vote, for example, it outperforms Wiki-vote when
not using outlier handling, and gives almost the same
performance when using outlier handling.

6.5.2 Performance with Different Scheduling Policies

Now, we turn our attention to measuring and
understanding the impact of the different scheduling
policies discussed in §4.2 on the performance of
SocialCloud. We consider the different datasets in
Table 2 and use p = 0.1 to 0.5 with 0.2 increments (the
results are shown in Figure 6). The consistent pattern
observed in almost all figures in this experiment is
that the shortest first policy outperforms the round
robin scheduling policy, whereas the round robin
scheduling policy outperforms the longest first policy.
This pattern is consistent regardless of p and the
outlier handling policy.

The difference in the performance when using different
policies can be as low as 2% (when p = 0.1 in the
physics co-authorship graph; shown in Figure 6(b)) and
as high as 70% (when using p = 0.5 and outlier handling,
as in the Wiki-vote graph in Figure 6(o)). The patterns
are made clearer in Figure 6 by observing combinations
of parameters and policies.

We finally notice that, despite the difference in the
performance of SocialCloud when using different
policies, it still results in a reasonable normalized
finishing time for all users, suggesting its
practicality with respect to the measured metric under
different parameters.
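The ordering of the three policies can be illustrated with a small single-worker model. This is a sketch under stated assumptions, not the schedulers of §4.2: one worker processes a queue of sub-tasks sequentially, round robin uses a hypothetical time quantum of 100 units, and the sub-task sizes are made up.

```python
from collections import deque


def finish_times_sorted(sizes, longest_first=False):
    """Shortest-first (default) or longest-first: run sub-tasks to
    completion in sorted order; return the finishing time of each."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i],
                   reverse=longest_first)
    clock = 0.0
    done = [0.0] * len(sizes)
    for i in order:
        clock += sizes[i]
        done[i] = clock
    return done


def finish_times_round_robin(sizes, quantum):
    """Round robin: serve sub-tasks in turn, at most `quantum` time units
    per turn, until every sub-task is finished."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    remaining = list(sizes)
    queue = deque(i for i, s in enumerate(sizes) if s > 0)
    clock = 0.0
    done = [0.0] * len(sizes)
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)  # not finished yet: back of the queue
        else:
            done[i] = clock
    return done


def mean(values):
    return sum(values) / len(values)


sizes = [300, 100, 200]  # sub-tasks received from three neighbours
sfs = finish_times_sorted(sizes)
lfs = finish_times_sorted(sizes, longest_first=True)
rrs = finish_times_round_robin(sizes, quantum=100)

print("SFS", sfs, mean(sfs))
print("RRS", rrs, mean(rrs))
print("LFS", lfs, mean(lfs))
```

In this toy instance the mean finishing time is smallest for shortest first, then round robin, then longest first, while the last sub-task finishes at the same time under all three. This agrees with the ordering reported above, although a single hand-picked queue is not evidence for the general pattern.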

6.5.3 Performance with Outliers Handling 

Outliers, as defined in §4.3, drag the performance of
the entire system down. However, as pointed out
earlier, handling outliers is quite simple in
SocialCloud if accurate timing is used in the system.
Here we consider the impact of the outlier handling
policy explained in §4.3 on the aggregate performance
of the entire system. The impact of using the outlier
handling policy can also be seen in Figure 6, which is
used for demonstrating the impact of using different
scheduling policies as well. In this figure, we see
that the simple handling policy we proposed improves
the performance of the system in all cases, often
greatly.

More specifically, the improvement in the performance
differs depending on other parameters, such as p and
the scheduling policy. As with the scheduling policy,
the improvement can be as low as 2% and as high as
more than 60%. When p is large, the potential for
improvement is high; see, for example, p = 0.5 in
Physics 2 (in Figure 6) with the round robin scheduling
policy, where almost 65% improvement is due to outlier
handling when x = 1.

Figure 4: The normalized time it takes to perform
outsourced computations in SocialCloud. Different
graphs with different social characteristics have
different performance results, where those with
well-defined social structures have self-load-balancing
features, in general. These measurements are taken with
the round-robin scheduling algorithm that uses the
outlier handling policy in §4.3 for a fixed task size
(of 1000 simulation time units).

6.5.4 Performance with Variable Task Size 

In all of the above experiments, we considered
computational tasks of fixed size: 1000 virtual time
units each. Whether the same pattern would be observed
in tasks with variable size is unclear. Here we
experimentally address this concern by using a variable
task size that is uniformly distributed in the interval
[500, 1500] time units. The results are shown in
Figure 7. Comparing these results to the middle row of
Figure 6 (for the fixed size tasks), we make two
observations. (i) While the average task size in both
scenarios is the same, we observe that the performance
with variable task size is worse. This is anticipated,
as our measure of performance is the time to finish,
which increases as some tasks with a longer time to
finish are added. (ii) The same patterns favoring one
scheduling policy over another are maintained as
earlier with fixed task length.

6.5.5 Relationship Between Structure and Performance

It is worth noting that the performance of SocialCloud
is closely related to the underlying structure of the
social graph. For example, sparse graphs such as
co-authorship graphs, which are pointed out in [38] to
be slow mixing graphs, are the graphs with a
performance advantage in SocialCloud. These graphs, in
particular, are shown to possess a trust value that can
be further utilized for SocialCloud. Furthermore, this
trust value is unlikely to be found in online social
networks, which are prone to infiltration, making the
case for trust-possessing graphs even stronger, as
they achieve performance guarantees as well. This,
indeed, is an interesting finding by itself, since it
shows outcomes opposite to what is known in the
literature on the usefulness of these graphs; see §3,
and for more details, see [38].

6.6 Additional Features and Limitations 

Our simulator of SocialCloud omits a few details
concerning the way a distributed system behaves in
reality. In particular, our measurements do not report
on or experiment with failure. However, our simulator
is equipped with functionality for handling failure in
the same way used for handling outliers (cf. §4.3).
Furthermore, our simulator considers a simplistic
scenario by abstracting the hardware infrastructure,
and does not consider additional resources consumed,
such as memory and I/O resources. In the future, we
will consider equipping our simulator with such
functionalities and see how this affects the behavior
and benefits of SocialCloud.

[Figure 5 plots: six panels of CDF versus Time
(normalized), each with one curve per graph (Physics 1,
Physics 2, Epinion, Wiki-vote, DBLP): (a) Handled
outliers (p = 0.1), (b) Handled outliers (p = 0.3),
(c) Handled outliers (p = 0.5), (d) Unhandled outliers
(p = 0.1), (e) Unhandled outliers (p = 0.3),
(f) Unhandled outliers (p = 0.5).]

Figure 5: The performance of SocialCloud on the
different social graphs used for our experiments,
demonstrating the inherent differences in the different
social graphs. Both figures use p = 0.3 and the round
robin scheduling algorithm.

One last concern related to the demonstration of our
paradigm is that we do not consider the heterogeneity
of resources, such as bandwidth and computing
resources, in nodes acting as workers in the system.
Furthermore, we did not consider how this affects the
usability of our system, or how this particular aspect
of distributed computing systems would affect design
decisions and the utility of our paradigm. While this
is mainly future work to consider (cf. §??), we expect
that nodes would select workers among their social
neighbors that have resources and link capacities
exceeding a threshold, thus meeting an expected
performance outcome.
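The threshold-based selection anticipated above might look like the following sketch. It is purely hypothetical: the paper does not define how resources are measured, so the capacity and bandwidth fields, their units, the thresholds, and the neighbour names are all assumptions.

```python
def eligible_workers(neighbours, min_capacity, min_bandwidth):
    """Return the neighbours whose compute capacity and link bandwidth
    both exceed the given thresholds, in alphabetical order."""
    return sorted(
        name
        for name, (capacity, bandwidth) in neighbours.items()
        if capacity > min_capacity and bandwidth > min_bandwidth
    )


# Hypothetical social neighbours: name -> (capacity, bandwidth).
neighbours = {
    "alice": (4.0, 100.0),
    "bob": (1.0, 100.0),   # too little compute capacity
    "carol": (8.0, 5.0),   # too little bandwidth
    "dave": (2.5, 20.0),
}

workers = eligible_workers(neighbours, min_capacity=2.0, min_bandwidth=10.0)
print(workers)
```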

7. RELATED WORK 

There have been many papers on the use of social 
networks for building communication and security sys- 
tems, studying the performance of such designs on top 
of social networks, and analyzing the assumptions used 
in these designs as well. Below we highlight a few ex- 
amples of these efforts and works. 

Systems built on top of social networks include file
sharing systems [30], anonymous communication systems
[50] [42], Sybil defenses [M [33] [Ml [58], referral and
filtering systems [32] [44], and live streaming [35].
Most of these applications rely on the trust in the
social graph and on an algorithmic property that makes
the operation of these systems on top of social
networks effective. Another set of applications that
exploit social networks' trust is routing [7] [17] [20]
[37] in several settings, where it has been shown that
connectivity in social graphs can be of benefit in
disconnected networks. Finally, the assumptions of
social network-based systems have been explored
recently: Sybil defenses and their assumptions are
studied in [41], and trust is challenged in [38].

Perhaps the closest vein of related work in the
literature is on the use of social networks for
building computing services. At the time of writing,
most prior research has focused solely on providing
storage services rather than a platform for
computations. Such storage services use a slightly
different economic model from that of SocialCloud,
where payment rates per megabyte per month are used, as
opposed to our ecosystem. Examples of such efforts are
reported by Sato [45] and Tran et al. [48]. Xu et al.
[55] have further explored a first step in the
direction of building cloud computing platforms on top
of social networks by considering the access control
model in this domain with preferred access control
guarantees. The results of this work can be used as a
building block in our work to improve the quality of
access control and authorization.

In a similar vein of distributed computing service
design, there has been prior work in the literature on
using volunteers' resources for computations that
exploit the locality of data [14] [53], and on
examining programming paradigms such as MapReduce [21]
on such platforms [34] [11]. Finally, our work shares
several commonalities with grid and volunteer computing
systems [3S1 [M] [TH [53] [2], of which many aspects
are explored in the literature. Trust in grid computing
and volunteer-based systems is explored in ^ [5] [46]
[31] [23]. Applications built on top of these systems
that would fit our use model are reported in [53] [U
[52], among others.

Figure 6: The normalized time it takes to perform
outsourced computations in SocialCloud for different
scheduling policies. Naming convention: U stands for
unhandled outliers and B stands for handled outliers
(Balanced). RRS, SFS, and LFS stand for round-robin,
shortest first, and longest first scheduling. We fix
the job size among all outsourcers.



8. CONCLUDING REMARKS 

In this paper we have introduced the design of
SocialCloud, a distributed computing service that
recruits computing workers from friends in social
networks and uses such social networks, which
characterize trust relationships, to bootstrap trust in
the proposed computing service. We further advocated
the case of such a computing paradigm for the several
advantages it provides.

To demonstrate the potential of our proposed design,
we used several real-world social graphs to bootstrap
the proposed service and demonstrated that the majority
of nodes in most cases would benefit computationally
from outsourcing their computations to such a service.
We considered several basic distributed system
characteristics and features, such as outlier handling,
scheduling decisions, and scheduler design, and showed
advantages in each of these features and options when
used in our system.



To the best of our knowledge, this is the first and
only work in the literature that bases the design of
such a computing paradigm on volunteers recruited from
social networks and tries to bring the trust factor
from these networks into such systems. This
characteristic distinguishes our work from the prior
work in the literature that uses volunteers' resources
for computations [14] [53].

The most important outcome of this study, along with
the proposed design, is the relationship exposed
between the social graphs and the behavior of the
computing service built on top of them. In particular,
we have shown that social graphs that possess strong
trust characteristics as evidenced by face-to-face
interaction [38], which are known in the literature for
their poor characteristics prohibiting their use in
applications (such as Sybil defenses [15] [57] [58]),
have self-load-balancing characteristics when the
number of outsourcers is relatively small (say 10 to 20
percent of the overall population of nodes in the
computing service). That is, the time it takes to
finish tasks originated by a given fraction of nodes in
such a graph is, for the majority of these nodes,
relatively short.

Figure 7: The normalized time it takes to perform
outsourced computations in SocialCloud for different
scheduling policies. Naming convention: U stands for
unhandled outliers and B stands for handled outliers
(Balanced). RRS, SFS, and LFS stand for round-robin,
shortest first, and longest first scheduling. We set
jobs with variable lengths as described above.

On the other hand, such characteristics and advantages
are maintained even when the number of outsourcers of
computations is as high as 50% of the nodes, contrary
to the case of other graphs with dense structure and
high connectivity known to be suitable for the
aforementioned applications. This last observation
encourages us to investigate further scenarios of
deployment of our design. We anticipate interesting
findings based on the inherent structure of such
deployment contexts, since such contexts may have
different social structures that would affect the
utility of the built computing overlay.



9. FUTURE WORK 

In the future we will look at two directions. In the
first direction, we aim to complete the missing
ingredients of the simulator and enrich it with further
scenarios of deployment of our design: under failure,
with different scheduling algorithms at both the
outsourcer and worker sides (in addition to those
discussed in this work), and with other overhead
characteristics that might not be in line with the
topological characteristics of the social graph. These
characteristics may include uptime, downtime,
communication overhead, and I/O overhead, among others.
One interesting feature that we will consider is
trust-based scheduling, benefiting from the prior work
in [38].

In the second direction, we will turn our attention
from the simulation settings to real-world deployment
settings, thus addressing the options discussed in
§6.6, and implement a proof-of-concept application,
among those discussed in §3.4, by utilizing the design
options discussed in this paper. We anticipate that
many hidden complexities in the design will arise, and
that significant findings will come out of the
deployment, which we will report on in future work.

Figure 8: The normalized time it takes to perform
outsourced computations in SocialCloud, for variable
task size.